Systems Programming

Building a CPU Scheduling Simulator

A Deep Dive into Operating Systems

Author: Vedant Misra

Institution: Pennsylvania State University

Course: CMPSC 473 - Operating Systems

Project Duration: Fall 2024

Project Overview

A comprehensive discrete event simulation framework for implementing and comparing various CPU scheduling algorithms including FCFS, LCFS, SJF, and MLFQ. This project demonstrates deep understanding of operating systems concepts, event-driven programming, and performance analysis.

Key Achievements

  • Implemented 4 scheduling algorithms (FCFS, LCFS, SJF, MLFQ)
  • Built discrete event simulation engine
  • Generic linked list data structure with sorting
  • Event-driven callback architecture
  • Comprehensive testing with 40 test cases
  • Performance metrics and analysis

Technologies Used

C (C11) Make Python Testing Discrete Event Simulation Event-Driven Programming Data Structures Algorithm Design Linux/Unix

1. Introduction

In modern operating systems, the CPU scheduler is one of the most critical components, responsible for deciding which process gets to execute on the CPU at any given time. This project implements a discrete event simulator that models different CPU scheduling policies, allowing us to understand their behavior, trade-offs, and performance characteristics.

Project Objectives

  • Implement a discrete event simulation framework for CPU scheduling
  • Build four different scheduling algorithms: FCFS, LCFS, SJF, and MLFQ
  • Develop a generic linked list data structure to manage job queues
  • Compare scheduling policies based on job completion times
  • Master event-driven programming paradigms

2. Background: CPU Scheduling in Operating Systems

What is CPU Scheduling?

CPU scheduling is the process by which an operating system decides which process in the ready queue gets access to the CPU. The scheduler is invoked whenever:

  1. A process switches from running to waiting state
  2. A process switches from running to ready state (preemption)
  3. A process switches from waiting to ready
  4. A process terminates

Key Scheduling Metrics

Understanding scheduling requires familiarity with several key metrics:

┌─────────────────────────────────────────────────────────┐
│  Key Performance Metrics                                │
├─────────────────────────────────────────────────────────┤
│  • Turnaround Time = Completion Time - Arrival Time     │
│  • Response Time = First Run Time - Arrival Time        │
│  • Waiting Time = Turnaround Time - Burst Time          │
│  • Throughput = Number of processes completed per unit  │
│  • CPU Utilization = % of time CPU is busy              │
└─────────────────────────────────────────────────────────┘

The Scheduling Challenge

The ideal scheduler would:

  • Maximize CPU utilization (keep CPU busy)
  • Maximize throughput (complete more jobs)
  • Minimize turnaround time (jobs finish quickly)
  • Minimize waiting time (jobs don't wait long)
  • Ensure fairness (all jobs get a chance)

However, these goals often conflict with each other, making scheduling a fascinating optimization problem.

3. Discrete Event Simulation: The Foundation

What is Discrete Event Simulation (DES)?

Rather than continuously simulating every nanosecond of time, DES models a system as a sequence of discrete events. Time "jumps" from one event to the next.

Timeline Comparison:

Continuous Simulation:
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
0  1  2  3  4  5  6  7  8  9  ...
(Simulates every time unit)

Discrete Event Simulation:
├───────┼────┼──────────┼───────┼─────┤
0       7    11         17      23    30
  Event1  E2    Event3     E4    E5
(Only simulates when events occur)

Core Components of Our Simulator

Our discrete event simulator consists of:

  1. Event Queue: A sorted list of future events
  2. Simulator Clock: Tracks current simulated time
  3. Event Types: Job arrivals and job completions
  4. Callbacks: Functions executed when events occur
// Event structure
typedef struct {
    uint64_t timestamp;        // When the event occurs
    event_type_t type;         // ARRIVAL or COMPLETION
    uint64_t id;               // Unique event identifier
    event_callback callback;   // Function to call
    void* callbackData;        // Data passed to callback
} event_t;

How the Simulator Works

┌─────────────────────────────────────────────────────────┐
│  Discrete Event Simulation Loop                         │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  1. Initialize event queue with job arrivals            │
│  2. While event queue is not empty:                     │
│     a. Get earliest event from queue                    │
│     b. Advance simulation time to event time            │
│     c. Execute event callback                           │
│     d. Remove event from queue                          │
│  3. Simulation complete                                 │
│                                                          │
└─────────────────────────────────────────────────────────┘

Key Insight: Events can schedule new events! For example, a job arrival event schedules a job completion event, and a job completion event may schedule another job completion event for the next job.

4. Project Architecture

System Overview

The project is structured with clear separation of concerns:

┌─────────────────────────────────────────────────────────────┐
│                     System Architecture                      │
└─────────────────────────────────────────────────────────────┘

┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│  Trace File  │─────▶│  Simulator   │─────▶│ Output File  │
│  (Input)     │      │   Engine     │      │  (Results)   │
└──────────────┘      └──────┬───────┘      └──────────────┘
                             │
                    ┌────────┴────────┐
                    │                 │
            ┌───────▼────────┐ ┌─────▼──────────┐
            │   Scheduler    │ │  Linked List   │
            │   (Policy)     │ │ (Job Queue)    │
            └────────────────┘ └────────────────┘
                    │
        ┌───────────┼───────────┬───────────┐
        │           │           │           │
    ┌───▼───┐  ┌───▼───┐  ┌───▼───┐  ┌───▼───┐
    │ FCFS  │  │ LCFS  │  │  SJF  │  │ MLFQ  │
    └───────┘  └───────┘  └───────┘  └───────┘

File Organization

project/
├── Core Simulator
│   ├── simulator.h/c          # Event queue management
│   ├── scheduler.h/c          # Scheduler interface
│   └── main.c                 # Entry point
│
├── Data Structures
│   ├── linked_list.h/c        # Generic linked list
│   └── job.h                  # Job data structure
│
├── Scheduling Policies (My Implementation)
│   ├── schedulerFCFS.c        # First-Come-First-Served
│   ├── schedulerLCFS.c        # Last-Come-First-Served
│   ├── schedulerSJF.c         # Shortest Job First
│   └── schedulerMLFQ.c        # Multi-Level Feedback Queue
│
├── Testing
│   ├── traces/                # Test cases
│   ├── grade.py               # Automated testing
│   └── linked_list_test.c     # Unit tests
│
└── Build System
    └── Makefile

Module Responsibilities

  • Simulator: Manages event queue, advances time, invokes callbacks
  • Scheduler: Defines interface for scheduling policies
  • Linked List: Provides generic queue/priority queue functionality
  • Job: Encapsulates job information (arrival, duration, ID)
  • Policy Implementations: Implement specific scheduling algorithms

5. Data Structures: Building a Generic Linked List

Why a Generic Linked List?

A generic linked list is the perfect data structure for this project because:

  1. Dynamic Size: Job queues grow and shrink dynamically
  2. Efficient Operations: O(1) insertions at head/tail
  3. Sorted Support: Can maintain jobs in sorted order
  4. Type Agnostic: Works with any data type via void* pointers

Design Philosophy

// Node structure - doubly linked for bidirectional traversal
typedef struct list_node {
    struct list_node* next;    // Next node in list
    struct list_node* prev;    // Previous node in list
    void* data;                // Generic data pointer
} list_node_t;

// List structure - maintains head, tail, and metadata
typedef struct {
    list_node_t* head;         // First node
    list_node_t* tail;         // Last node
    size_t count;              // Number of nodes
    compare_fn compare;        // Comparison function for sorting
} list_t;

Comparison Functions for Sorted Insertion

The list can maintain sorted order using a user-defined comparison function:

// Comparison function type
// Returns: -1 if data1 < data2, 0 if equal, 1 if data1 > data2
typedef int (*compare_fn)(void* data1, void* data2);

// Example: Sort jobs by remaining time (for SJF)
static int compareJobs(void* a, void* b) {
    job_t* job1 = (job_t*)a;
    job_t* job2 = (job_t*)b;
    
    uint64_t time1 = jobGetRemainingTime(job1);
    uint64_t time2 = jobGetRemainingTime(job2);
    
    if (time1 == time2) {
        // Tie-breaker: use job ID
        uint64_t id1 = jobGetId(job1);
        uint64_t id2 = jobGetId(job2);
        return (id1 < id2) ? -1 : (id1 > id2) ? 1 : 0;
    }
    
    return (time1 < time2) ? -1 : 1;
}

Flexible Insertion Modes

Unsorted Insertion (compare = NULL):
Always insert at head → O(1) operation

   HEAD                              TAIL
    ↓                                 ↓
   [New] ←→ [Node1] ←→ [Node2] ←→ [Node3]

Sorted Insertion (compare provided):
Insert in correct position → O(n) operation (acceptable for this project)

   HEAD                              TAIL
    ↓                                 ↓
   [Node1] ←→ [New] ←→ [Node2] ←→ [Node3]
            (inserted here based on compare function)

6. Scheduling Algorithms Implementation

1. First-Come-First-Served (FCFS)

Algorithm Overview:

  • Jobs are executed in the order they arrive
  • Non-preemptive: once a job starts, it runs to completion
  • Fair but can suffer from the convoy effect
Timeline Example:

Job Arrivals:  J1(t=0,dur=3), J2(t=1,dur=1), J3(t=2,dur=2)

CPU: [====J1====][=J2=][==J3==]
Time: 0    1    2    3    4    5    6

Completion Times: J1→3, J2→4, J3→6
Avg Turnaround: ((3-0)+(4-1)+(6-2))/3 = 3.67

The Convoy Effect:

Scenario: Short jobs arrive after a long job

Jobs: J1(dur=10), J2(dur=1), J3(dur=1), J4(dur=1)

CPU: [==========J1==========][J2][J3][J4]
Time: 0                     10  11  12  13

J2, J3, J4 must wait for J1 to complete!
Average waiting time = (0+10+11+12)/4 = 8.25 units

2. Last-Come-First-Served (LCFS)

Algorithm Overview:

  • Most recently arrived job executes first
  • Non-preemptive: once started, runs to completion
  • Acts like a stack (LIFO)
Timeline Example:

Job Arrivals:  J1(t=0,dur=3), J2(t=1,dur=1), J3(t=2,dur=2)

CPU: [====J1====][=J3=][==J2==]
Time: 0    1    2    3    4    5    6

Note: J3 arrived last (at t=2) but runs before J2 (arrived at t=1)
Wait times are UNFAIR to early arrivals

3. Shortest Job First (SJF)

Algorithm Overview:

  • Always execute the job with shortest remaining time
  • Minimizes average waiting time
  • Requires knowing job durations in advance
  • Optimal non-preemptive scheduling algorithm
Better Example:
Jobs arrive together: J1(dur=5), J2(dur=2), J3(dur=3)

FCFS: J1→5, J2→7, J3→10, Avg=(5+7+10)/3=7.33
SJF:  J2→2, J3→5, J1→10, Avg=(2+5+10)/3=5.67 ✓ Better!

Why SJF is Optimal: To minimize average waiting time, execute shorter jobs first. Generalizes to n jobs: always pick shortest remaining job.

The Starvation Problem:

Scenario: Long job J1(dur=100) arrives at t=0
          Short jobs keep arriving: J2,J3,J4... (dur=1) at t=1,2,3...

Result: J1 NEVER runs! It keeps getting pushed back by shorter jobs.

Solution: Use aging or switch to preemptive SRTF

4. Multi-Level Feedback Queue (MLFQ)

Algorithm Overview:

MLFQ is one of the most sophisticated and widely-used scheduling algorithms. It attempts to:

  • Optimize turnaround time (like SJF) without knowing job durations
  • Minimize response time (like Round Robin) for interactive jobs
  • Adapt to job behavior dynamically

The MLFQ Rules

┌────────────────────────────────────────────────────────┐
│  MLFQ Basic Rules                                      │
├────────────────────────────────────────────────────────┤
│  Rule 1: If Priority(A) > Priority(B), A runs          │
│  Rule 2: If Priority(A) = Priority(B), RR between A,B  │
│  Rule 3: New jobs enter at highest priority            │
│  Rule 4: If job uses entire time slice:                │
│          → Demote to lower priority                    │
│  Rule 5: If job yields CPU before time slice:          │
│          → Keep same priority (not implemented here)   │
└────────────────────────────────────────────────────────┘

Visual Representation

Priority Levels (16 levels, 0 = highest):

Level 0: [────] ← New jobs, Interactive jobs (time slice = 1)
Level 1: [────]
Level 2: [────]
   ...
Level 15: [───────] ← CPU-intensive jobs (time slice = 1)

Job Behavior:
┌────────┐
│ New Job│ enters at Level 0
└───┬────┘
    │
    ├─→ Uses full time slice (1 unit)
    │   └─→ Demoted to Level 1
    │
    ├─→ Uses full time slice again
    │   └─→ Demoted to Level 2
    │
    └─→ Eventually settles at appropriate priority

How MLFQ Learns Job Behavior

Short/Interactive Jobs:
Job: Short burst (2 units total)
├─ t=0: Arrives at Q0, runs 1 unit
├─ t=1: Demoted to Q1, runs 1 unit
└─ t=2: Completes ✓
Result: Completes quickly (low turnaround time)

Long/CPU-bound Jobs:
Job: Long burst (20 units total)
├─ t=0: Arrives at Q0, runs 1 unit
├─ t=1: Demoted to Q1, runs 1 unit
├─ t=2: Demoted to Q2, runs 1 unit
│   ...keeps getting demoted...
├─ t=15: At Q15 (lowest priority)
└─ Runs when no higher priority jobs exist
Result: Doesn't block short jobs (good response time for others)

7. Event-Driven Programming Model

The Callback Pattern

Our simulator uses callbacks extensively. This is a function pointer that gets invoked at event time.

// Callback type definition
typedef void (*event_callback)(void* callbackData);

// Scheduling an event
list_node_t* simulatorSchedule(simulator_t* sim, uint64_t timestamp, 
                               event_type_t type, 
                               event_callback callback, 
                               void* callbackData);

Event Flow Diagram

┌────────────────────────────────────────────────────────┐
│  Event Flow Example: Job Arrival and Completion        │
└────────────────────────────────────────────────────────┘

1. Trace Reader schedules job arrivals
   │
   └─→ simulatorSchedule(sim, arrivalTime, EVENT_ARRIVAL, 
                         schedulerScheduleJob, job)

2. Simulator advances time to arrivalTime
   │
   └─→ Invokes: schedulerScheduleJob(job)

3. Scheduler handles arrival
   │
   ├─→ Adds job to queue
   └─→ Calls: schedulerScheduleNextCompletion(completionTime)
              │
              └─→ simulatorSchedule(sim, completionTime, 
                                   EVENT_COMPLETION, 
                                   schedulerCompleteJob, scheduler)

4. Simulator advances time to completionTime
   │
   └─→ Invokes: schedulerCompleteJob()

5. Scheduler handles completion
   │
   ├─→ Removes job from queue
   └─→ May schedule next completion (if more jobs exist)

Key Insight: One Completion at a Time

❌ WRONG: Schedule all completions immediately
├─ Job 1 arrives → schedule completion at t=10
├─ Job 2 arrives → schedule completion at t=15
└─ Job 3 arrives → schedule completion at t=20

Why wrong? What if Job 4 (shorter) arrives at t=5?
With SJF, the completions at t=15 and t=20 are invalid!

✓ CORRECT: Only schedule next completion
├─ Job 1 arrives → schedule completion at t=10
├─ Job 2 arrives → (don't schedule completion yet)
├─ Job 3 arrives → (don't schedule completion yet)
├─ t=10: Job 1 completes → NOW schedule Job 2's completion
└─ t=15: Job 2 completes → NOW schedule Job 3's completion

8. Testing and Validation

Test Framework

The project uses a comprehensive testing framework with 40 test cases (10 per scheduler):

Trace Files (Input):
┌──────────────────────────────┐
│ JobID, ArrivalTime, Duration │
├──────────────────────────────┤
│ 1,0,5                        │
│ 2,1,3                        │
│ 3,2,1                        │
└──────────────────────────────┘

Expected Output:
┌──────────────────────────────┐
│ JobID, CompletionTime        │
├──────────────────────────────┤
│ 1,5                          │
│ 2,8                          │
│ 3,6                          │
└──────────────────────────────┘

Test Result: diff shows no differences ✓ PASS

Running Tests

# Compile the project
make

# Run a single test
./simulator traces/FCFS_1.csv traces/FCFS_1.csv.out FCFS

# Compare with expected output
diff traces/FCFS_1.csv.out traces/FCFS_1.csv.expected

# Run all tests automatically
make test

Test Coverage

  • FCFS: Simple to complex arrival patterns
  • LCFS: Stack behavior, ordering verification
  • SJF: Tie-breaking, priority ordering
  • MLFQ: Preemption, multi-level queues, time slices

9. Key Insights and Learning Outcomes

1. Event-Driven Programming Mindset

Traditional Programming:
// Sequential thinking
for (int i = 0; i < numJobs; i++) {
    process(jobs[i]);
}

Event-Driven Programming:
// React to events
void onJobArrival(job_t* job) {
    // Only handle THIS arrival
    // Future events will trigger their own callbacks
}

Key Insight: Don't try to control the entire flow. React to individual events and let the simulator orchestrate the rest.

2. Trade-offs in Scheduling

┌────────────────────────────────────────────────────┐
│  Scheduler Comparison                              │
├────────────────────────────────────────────────────┤
│                                                    │
│  FCFS: Simple, fair, but suffers convoy effect     │
│        → Good for: Batch processing                │
│                                                    │
│  LCFS: Simple, unfair, poor turnaround             │
│        → Good for: Stack-based workflows (rare)    │
│                                                    │
│  SJF:  Optimal avg wait, but starves long jobs     │
│        → Good for: Known job durations             │
│                                                    │
│  MLFQ: Adaptive, good all-around, complex          │
│        → Good for: General-purpose OS              │
│                                                    │
└────────────────────────────────────────────────────┘

3. The Power of Generic Data Structures

Using void* pointers made the linked list reusable:

  • Job queues (storing job_t*)
  • Event queues (storing event_t*)
  • Priority queues (with compare function)
  • FIFO queues (without compare function)

Trade-off: Type safety vs. flexibility. Advantage: Single implementation, many uses. Disadvantage: No compile-time type checking.

4. Separation of Concerns

The architecture cleanly separates:

  • Simulator: Time management, event ordering
  • Scheduler Interface: Common operations for all policies
  • Policy Implementation: Specific scheduling logic
  • Data Structures: Generic reusable components

Benefit: Can easily add new scheduling policies without modifying simulator.

5. Testing is Essential

With 40 test cases (10 per scheduler), testing caught:

  • Off-by-one errors in list traversal
  • Incorrect completion time calculations
  • Memory leaks from forgotten frees
  • Edge cases (empty queue, single job, simultaneous arrivals)

10. Conclusion

Project Summary

This CPU scheduling simulator project demonstrates:

  • Systems Programming: Low-level C programming with manual memory management
  • Operating Systems Concepts: Deep understanding of scheduling algorithms
  • Software Architecture: Clean separation of concerns, modular design
  • Event-Driven Programming: Callback-based control flow
  • Data Structures: Generic linked list with sorted insertion
  • Testing & Validation: Comprehensive test suite with automated grading

Real-World Applications

The concepts learned apply to:

  • Operating System Development: Linux, Windows scheduler implementation
  • Network Packet Scheduling: QoS, traffic shaping
  • Database Query Optimization: Query scheduling and execution
  • Cloud Resource Management: VM/container scheduling (Kubernetes)
  • Real-Time Systems: Task scheduling with deadlines

Key Takeaways

  1. Scheduling is about trade-offs: No single algorithm is best for all workloads
  2. Event-driven programming requires a different mindset: React to events rather than controlling flow
  3. Generic data structures are powerful: Same list serves multiple purposes
  4. Simulation is a valuable learning tool: Understand system behavior without building real OS
  5. Testing is not optional: Complex systems require comprehensive validation

Future Enhancements

Possible extensions to this project:

  • Shortest Remaining Time First (SRTF) - Preemptive version of SJF
  • Priority Scheduling with explicit job priorities
  • Fair Share Scheduling to ensure fairness across users
  • Real-Time Scheduling with Earliest Deadline First (EDF)
  • Multi-Core Scheduling with multiple CPUs and load balancing
  • Performance Metrics visualization for turnaround time, wait time, throughput
  • GUI Visualizer to see jobs moving through queues in real-time